Using Zero-Resource Spoken Term Discovery for Ranked Retrieval

Authors

  • Jerome White
  • Douglas W. Oard
  • Aren Jansen
  • Jiaul H. Paik
  • Rashmi Sankepally
Abstract

Research on ranked retrieval of spoken content has assumed the existence of some automated (word or phonetic) transcription. Recently, however, methods have been demonstrated for matching spoken terms to spoken content without the need for language-tuned transcription. This paper describes the first application of such techniques to ranked retrieval, evaluated using a newly created test collection. Both the queries and the collection to be searched are based on Gujarati produced naturally by native speakers; relevance assessment was performed by other native speakers of Gujarati. Ranked retrieval is based on fast acoustic matching that identifies a deeply nested set of matching speech regions, coupled with ways of combining evidence from those matching regions. Results indicate that the resulting ranked lists may be useful for some practical similarity-based ranking tasks.
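The abstract does not say how evidence from nested matching regions is combined, so the following is only a minimal sketch of one plausible scheme, not the authors' method. It assumes each acoustic match is reported as a document identifier, a time span in seconds, and a similarity score. The names `Match`, `outermost`, and `rank` are hypothetical, as is the choice to discard regions nested inside a longer match and to weight each remaining region by its duration.

```python
from collections import defaultdict
from typing import NamedTuple


class Match(NamedTuple):
    """One matching speech region (hypothetical representation)."""
    doc_id: str
    start: float  # seconds
    end: float    # seconds
    score: float  # acoustic similarity, assumed to lie in [0, 1]


def outermost(matches):
    """Keep only regions that are not fully contained in a kept region."""
    kept = []
    # Sort by start ascending, then end descending, so that a containing
    # region is always visited before the regions nested inside it.
    for m in sorted(matches, key=lambda m: (m.start, -m.end)):
        if not any(k.start <= m.start and m.end <= k.end for k in kept):
            kept.append(m)
    return kept


def rank(matches):
    """Rank documents by the duration-weighted score of outermost regions."""
    by_doc = defaultdict(list)
    for m in matches:
        by_doc[m.doc_id].append(m)
    scores = {
        doc: sum(m.score * (m.end - m.start) for m in outermost(ms))
        for doc, ms in by_doc.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    demo = [
        Match("a", 0.0, 2.0, 0.9),
        Match("a", 0.5, 1.0, 0.8),  # nested inside the first region
        Match("b", 1.0, 1.5, 0.7),
    ]
    for doc, score in rank(demo):
        print(doc, round(score, 3))
```

Dropping nested regions avoids counting the same stretch of audio more than once. Other combinations (taking the maximum score, or summing over all regions) are equally easy to substitute in `rank`.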


Similar resources

Combining Evidence from Unconstrained Spoken Term Frequency Estimation for Improved Speech Retrieval

Title of dissertation: Combining Evidence from Unconstrained Spoken Term Frequency Estimation for Improved Speech Retrieval. J. Scott Olsson, Doctor of Philosophy, 2008. Dissertation directed by: Associate Professor Douglas W. Oard, College of Information Studies. This dissertation considers the problem of information retrieval in speech. Today’s speech retrieval systems generally use a large vocab...

Zero-Resource Audio-Only Spoken Term Detection Based on a Combination of Template Matching Techniques

Spoken term detection is a well-known information retrieval task that seeks to extract contentful information from audio by locating occurrences of known query words of interest. This paper describes a zero-resource approach to this task based on pattern matching of spoken term queries at the acoustic level. The template matching module comprises the cascade of a segmental variant of dynamic ti...

The Zero Resource Speech Challenge 2015: Proposed Approaches and Results

This paper reports on the results of the Zero Resource Speech Challenge 2015, the first unified benchmark for zero resource speech technology, which aims at the unsupervised discovery of subword and word units from raw speech. This paper discusses the motivation for the challenge, its data sets, tasks and baseline systems. We outline the ideas behind the systems that were submitted for the two ...

Towards spoken term discovery at scale with zero resources

The spoken term discovery task takes speech as input and identifies terms of possible interest. The challenge is to perform this task efficiently on large amounts of speech with zero resources (no training data and no dictionaries), where we must fall back to more basic properties of language. We find that long (∼ 1 s) repetitions tend to be contentful phrases (e.g. University of Pennsylvania) ...

Data-driven Posterior Features for Low Resource Speech Recognition Applications

In low resource settings, with very few hours of training data, state-of-the-art speech recognition systems that require large amounts of task specific training data perform very poorly. We address this issue by building data-driven speech recognition front-ends on significant amounts of task independent data from different languages and genres collected in similar acoustic conditions as the da...

Publication date: 2015